Multi-Document Summarization from First Principles

Author

  • William M. Darling
Abstract

We present SumBasic+, a powerful multi-document summarization system built from first principles. SumBasic+ is designed as a baseline system to gauge the level of summarization results we could obtain using simple statistical techniques. Our extractive summarization system is based on word frequency statistics similar to the SumBasic method. Nevertheless, we were able to considerably improve its summarization performance by tuning the amount and type of redundancy removal performed, adding a simple query-focused summarization component, and employing a number of pre- and post-processing compression techniques. The resulting system, SumBasic+, is a strong baseline system that is ideal for comparing with new summarization approaches, as it principally uses existing techniques and performs surprisingly well. Of 43 competing systems in the TAC 2010 summarization track, our system achieved fourth and third place in R-2 and R-SU4 ROUGE scores respectively, and second overall in the manual average pyramid evaluation for the initial summaries.

Keywords: Text processing; Artificial intelligence
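The word-frequency idea that the abstract builds on can be illustrated with a minimal sketch of a plain SumBasic-style selection loop. This is not the SumBasic+ system itself: the query-focused component, the tuned redundancy removal, and the compression steps mentioned above are not modeled here. The function names `tokenize` and `sumbasic`, the regular-expression tokenizer, and the `max_words` budget are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def sumbasic(sentences, max_words=100):
    """Greedy SumBasic-style sentence selection (illustrative sketch only)."""
    tokens = [tokenize(s) for s in sentences]
    counts = Counter(w for sent in tokens for w in sent)
    total = sum(counts.values())
    if total == 0:
        return []
    # Unigram probability of each word over the whole input.
    prob = {w: c / total for w, c in counts.items()}

    chosen, length = [], 0
    remaining = [i for i, t in enumerate(tokens) if t]
    while remaining:
        # Highest-probability word still available in an unselected sentence.
        top = max((w for i in remaining for w in tokens[i]), key=prob.get)
        candidates = [i for i in remaining if top in tokens[i]]
        # Score each candidate by the average probability of its words.
        best = max(
            candidates,
            key=lambda i: sum(prob[w] for w in tokens[i]) / len(tokens[i]),
        )
        remaining.remove(best)
        if length + len(tokens[best]) > max_words:
            continue  # sentence does not fit in the budget; try the others
        chosen.append(best)
        length += len(tokens[best])
        # Redundancy control: square the probability of words already covered.
        for w in set(tokens[best]):
            prob[w] **= 2
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    docs = ["The cat sat on the mat.", "The cat ate.", "Dogs bark loudly."]
    print(sumbasic(docs, max_words=3))
```

Squaring the probabilities of covered words is what discourages the loop from picking sentences that repeat content already in the summary; a system like the one described would presumably adjust this step, but how it does so is not stated in the abstract.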

Similar resources

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

A Survey on Multi-Document Summarization

Multi-document summarization aims at delivering the majority of information content from multiple documents using much less lengthy texts, usually a short paragraph of several hundred words. This paper surveys several different approaches to multi-document summarization by first building a unified high level view of the multi-document summarization problem, and then comparing different approach...

Multi-Document Abstractive Summarization Using ILP Based Multi-Sentence Compression

Abstractive summarization is an ideal form of summarization since it can synthesize information from multiple documents to create concise informative summaries. In this work, we aim at developing an abstractive summarizer. First, our proposed approach identifies the most important document in the multi-document set. The sentences in the most important document are aligned to sentences in other ...

Multi-Document Summarization using Automatic Key-Phrase Extraction

The development of a multi-document summarizer using automatic key-phrase extraction has been described. This summarizer has two main parts; first part is automatic extraction of Key-phrases from the documents and second part is automatic generation of a multidocument summary based on the extracted key-phrases. The CRF based Automatic Keyphrase extraction system has been used here. A document g...

AllSummarizer system at MultiLing 2015: Multilingual single and multi-document summarization

In this paper, we evaluate our automatic text summarization system in multilingual context. We participated in both single document and multi-document summarization tasks of MultiLing 2015 workshop. Our method involves clustering the document sentences into topics using a fuzzy clustering algorithm. Then each sentence is scored according to how well it covers the various topics. This is done us...

Publication date: 2010